Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

Abstract

Live-stream and short-video shopping in e-commerce have grown exponentially. However, sellers are required to manually match images of the products they sell to the timestamps at which those products are exhibited in the untrimmed video, resulting in a complicated process. To solve this problem, we present an innovative demonstration of a multi-modal retrieval system called "Fashion Focus", which exactly localizes product images in the online video as the focuses. Different modalities contribute to commodity localization: visual content, linguistic features, and interaction context are jointly investigated via the presented multi-modal learning. Our system employs two procedures for analysis, video content structuring and multi-modal retrieval, to automatically achieve accurate video-to-shop matching. Fashion Focus presents a unified framework that can orient consumers towards relevant product exhibitions while they watch videos and help sellers deliver products effectively over search and recommendation.
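
As a rough illustration of the video-to-shop matching idea only, the sketch below scores each video segment against a product embedding and returns the best-matching time span. It is not the authors' implementation: the names `cosine` and `localize_product`, the representation of each segment as a single fused embedding, and the use of cosine similarity are all assumptions made for this example, and the toy vectors are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize_product(product_vec, segments):
    """Return (start_sec, end_sec, score) of the segment most similar to the product.

    `segments` is a non-empty list of (start_sec, end_sec, embedding) tuples.
    Each embedding is assumed to already fuse the modalities of that segment
    (hypothetical: e.g. visual, text, and interaction features).
    """
    best = max(segments, key=lambda seg: cosine(product_vec, seg[2]))
    return best[0], best[1], cosine(product_vec, best[2])


# Toy data: three segments of an untrimmed video and one product embedding.
segments = [
    (0.0, 5.0, [1.0, 0.0, 0.0]),
    (5.0, 12.0, [0.0, 1.0, 0.2]),
    (12.0, 20.0, [0.1, 0.1, 1.0]),
]
product = [0.0, 0.9, 0.1]

start, end, score = localize_product(product, segments)
print(f"product shown from {start}s to {end}s (similarity {score:.3f})")
```

In a real system the segment boundaries would come from a video structuring step and the embeddings from learned encoders; here both are hard-coded so the example stays self-contained.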

Similar resources

Multi-Modal Fashion Product Retrieval

Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem. In this paper, we leverage both the images and textual metadata and propose a joint multi-modal embedding that maps both the text and images into a common latent space. Distances in the latent ...

Multi-modal query expansion for video object instances retrieval

In this paper we tackle the issue of object instance retrieval in video repositories using minimum information from the user (e.g., textual description/tags). Starting from a set of tags, images containing the object of interest are crawled from popular image search engines and repositories (e.g., Bing, Flickr, Google) and the positive and most representative instances of the object are automati...

Multi-modal Classifier Fusion for Video Shot Content Retrieval

In this paper we present a new chromosome to solve the problem of classifier fusion using genetic algorithm. Experiments are conducted in the context of TRECVID. In particular we focus on the feature extraction task that consists in retrieving video shots expressing one of predefined semantic concepts. Three modalities (visual, textual and motion) and two features per modality are used to descr...

A multi-modal system for the retrieval of semantic video events

A framework for event detection is proposed where events, objects, and other semantic concepts are detected from video using trained classifiers. These classifiers are used to automatically annotate video with semantic labels, which in turn are used to search for new, untrained types of events and semantic concepts. The novelty of the approach lies in the: (1) semi-automatic construction of mod...

Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba

Recommender systems (RSs) have been the most important technology for increasing business in Taobao, the largest online consumer-to-consumer (C2C) platform in China. The billion-scale data in Taobao creates three major challenges for Taobao's RS: scalability, sparsity, and cold start. In this paper, we present our technical solutions to address these three challenges. The methods are based on ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i18.18033